Prosper Loan Data Exploration by Saeid Rostami
========================================================
## [1] 113937
## [1] 81
Prosper, founded in 2005, is a peer-to-peer lending platform that connects people who need money with those who have money to invest. The Prosper loan data contain 113,937 rows and 81 columns.
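A minimal sketch of how the row and column counts could be obtained, assuming the data are read from a CSV file. The real file name is not given in this report, so the sketch writes a tiny temporary stand-in file with three of the real column names; the name `loans` is also just illustrative.

```r
# Tiny stand-in for the real file. The path and the three rows below are
# assumptions made only so that this sketch runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ListingKey,Term,LoanOriginalAmount",
  "A1,36,9425",
  "B2,36,10000",
  "C3,60,15000"
), csv_path)

loans <- read.csv(csv_path, stringsAsFactors = FALSE)

nrow(loans)   # number of rows (113937 for the full data set)
ncol(loans)   # number of columns (81 for the full data set)
names(loans)  # column names
```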
## [1] "ListingKey"
## [2] "ListingNumber"
## [3] "ListingCreationDate"
## [4] "CreditGrade"
## [5] "Term"
## [6] "LoanStatus"
## [7] "ClosedDate"
## [8] "BorrowerAPR"
## [9] "BorrowerRate"
## [10] "LenderYield"
## [11] "EstimatedEffectiveYield"
## [12] "EstimatedLoss"
## [13] "EstimatedReturn"
## [14] "ProsperRating..numeric."
## [15] "ProsperRating..Alpha."
## [16] "ProsperScore"
## [17] "ListingCategory..numeric."
## [18] "BorrowerState"
## [19] "Occupation"
## [20] "EmploymentStatus"
## [21] "EmploymentStatusDuration"
## [22] "IsBorrowerHomeowner"
## [23] "CurrentlyInGroup"
## [24] "GroupKey"
## [25] "DateCreditPulled"
## [26] "CreditScoreRangeLower"
## [27] "CreditScoreRangeUpper"
## [28] "FirstRecordedCreditLine"
## [29] "CurrentCreditLines"
## [30] "OpenCreditLines"
## [31] "TotalCreditLinespast7years"
## [32] "OpenRevolvingAccounts"
## [33] "OpenRevolvingMonthlyPayment"
## [34] "InquiriesLast6Months"
## [35] "TotalInquiries"
## [36] "CurrentDelinquencies"
## [37] "AmountDelinquent"
## [38] "DelinquenciesLast7Years"
## [39] "PublicRecordsLast10Years"
## [40] "PublicRecordsLast12Months"
## [41] "RevolvingCreditBalance"
## [42] "BankcardUtilization"
## [43] "AvailableBankcardCredit"
## [44] "TotalTrades"
## [45] "TradesNeverDelinquent..percentage."
## [46] "TradesOpenedLast6Months"
## [47] "DebtToIncomeRatio"
## [48] "IncomeRange"
## [49] "IncomeVerifiable"
## [50] "StatedMonthlyIncome"
## [51] "LoanKey"
## [52] "TotalProsperLoans"
## [53] "TotalProsperPaymentsBilled"
## [54] "OnTimeProsperPayments"
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"
## [57] "ProsperPrincipalBorrowed"
## [58] "ProsperPrincipalOutstanding"
## [59] "ScorexChangeAtTimeOfListing"
## [60] "LoanCurrentDaysDelinquent"
## [61] "LoanFirstDefaultedCycleNumber"
## [62] "LoanMonthsSinceOrigination"
## [63] "LoanNumber"
## [64] "LoanOriginalAmount"
## [65] "LoanOriginationDate"
## [66] "LoanOriginationQuarter"
## [67] "MemberKey"
## [68] "MonthlyLoanPayment"
## [69] "LP_CustomerPayments"
## [70] "LP_CustomerPrincipalPayments"
## [71] "LP_InterestandFees"
## [72] "LP_ServiceFees"
## [73] "LP_CollectionFees"
## [74] "LP_GrossPrincipalLoss"
## [75] "LP_NetPrincipalLoss"
## [76] "LP_NonPrincipalRecoverypayments"
## [77] "PercentFunded"
## [78] "Recommendations"
## [79] "InvestmentFromFriendsCount"
## [80] "InvestmentFromFriendsAmount"
## [81] "Investors"
## 'data.frame': 113937 obs. of 13 variables:
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years: int 0 1 0 0 0 0 0 1 0 0 ...
## $ DaysWithCreditLine : num 6242 8275 5953 13042 5380 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ Term : Factor w/ 3 levels "12","36","60": 2 2 2 2 2 3 2 2 2 2 ...
## $ ProsperRating : Factor w/ 7 levels "AA","A","B","C",..: NA 2 NA 2 5 3 6 4 1 1 ...
## $ ListingCreationDate : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ ListingCategory : Factor w/ 21 levels "Not available",..: 1 3 1 17 3 2 2 3 8 8 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ AnnualIncome : num 37000 73500 25000 34500 115000 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## DelinquenciesLast7Years PublicRecordsLast10Years DaysWithCreditLine
## Min. : 0.000 Min. : 0.0000 Min. : 2152
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 6818
## Median : 0.000 Median : 0.0000 Median : 8413
## Mean : 4.155 Mean : 0.3126 Mean : 8762
## 3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.:10392
## Max. :99.000 Max. :38.0000 Max. :26014
## NA's :990 NA's :697 NA's :697
## InquiriesLast6Months BorrowerRate Term ProsperRating
## Min. : 0.000 Min. :0.0000 12: 1614 C :18345
## 1st Qu.: 0.000 1st Qu.:0.1340 36:87778 B :15581
## Median : 1.000 Median :0.1840 60:24545 A :14551
## Mean : 1.435 Mean :0.1928 D :14274
## 3rd Qu.: 2.000 3rd Qu.:0.2500 E : 9795
## Max. :105.000 Max. :0.4975 (Other):12307
## NA's :697 NA's :29084
## ListingCreationDate LoanOriginalAmount
## 2013-10-02 17:20:16.550000000: 6 Min. : 1000
## 2013-08-28 20:31:41.107000000: 4 1st Qu.: 4000
## 2013-09-08 09:27:44.853000000: 4 Median : 6500
## 2013-12-06 05:43:13.830000000: 4 Mean : 8337
## 2013-12-06 11:44:58.283000000: 4 3rd Qu.:12000
## 2013-08-21 07:25:22.360000000: 3 Max. :35000
## (Other) :113912
## ListingCategory EmploymentStatus AnnualIncome
## Debt consolidation:58308 Employed :67322 Min. : 0
## Not available :16965 Full-time :26355 1st Qu.: 38404
## Other :10494 Self-employed: 6134 Median : 56000
## Home improvement : 7433 Not available: 5347 Mean : 67296
## Business : 7189 Other : 3806 3rd Qu.: 81900
## Auto : 2572 : 2255 Max. :21000035
## (Other) :10976 (Other) : 2718
## RevolvingCreditBalance
## Min. : 0
## 1st Qu.: 3121
## Median : 8549
## Mean : 17599
## 3rd Qu.: 19521
## Max. :1435667
## NA's :7604
The figure above shows the amounts of money that people borrow. As we can see, people tend to borrow in round numbers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
The smallest amount that people borrowed is 1000 and the largest is 35000. The median and mean are 6500 and 8337 respectively.
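One way to check the tendency toward round loan amounts is to measure the share of loans that are exact multiples of 1000 and to list the most frequent amounts. This is only a sketch: it uses the first ten LoanOriginalAmount values shown in the str() output, and the choice of 1000 as the step is my assumption.

```r
# First ten LoanOriginalAmount values, as printed in the str() output.
amounts <- c(9425, 10000, 3001, 10000, 15000, 15000, 3000, 10000, 10000, 10000)

summary(amounts)

# TRUE where the amount is an exact multiple of 1000 (assumed step size).
is_round <- amounts %% 1000 == 0
share_round <- mean(is_round)

# Most frequent loan amounts, largest count first.
top_amounts <- head(sort(table(amounts), decreasing = TRUE), 3)
top_amounts
```

On the full data set the same lines would be applied to the whole LoanOriginalAmount column instead of this ten-value sample.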
Debt consolidation is by far the most common listing category, so it seems that most people borrow to cover their debts.
As we can see, most of the borrowers are employed.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 38404 56000 67296 81900 21000035
Most annual incomes are between 20000 and 90000. The median is 56000 and the mean is 67296.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 0.000 4.155 3.000 99.000 990
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.0000 0.0000 0.3126 0.0000 38.0000 697
It is clear that most borrowers have zero or one delinquency in the last 7 years. Similarly, most have zero or one public record in the last 10 years.
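The "zero or one" statement can be turned into a number by computing the share of borrowers with at most one event. The sketch below uses only the first ten values of each column from the str() output, so the shares it produces describe that sample, not the full data set; the variable names are illustrative.

```r
# First ten values of each column, as printed in the str() output.
delinquencies  <- c(4, 0, 0, 14, 0, 0, 0, 0, 0, 0)
public_records <- c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0)

# Share of borrowers with at most one event.
# na.rm = TRUE skips missing values, which both full columns contain.
share_low_delinq  <- mean(delinquencies  <= 1, na.rm = TRUE)
share_low_records <- mean(public_records <= 1, na.rm = TRUE)

share_low_delinq
share_low_records
```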
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 3121 8549 17599 19521 1435667 7604
Revolving credit balance is the total outstanding balance that the borrower owes on his or her credit accounts. The median and mean are 8549 and 17599 respectively, and the most common amount is 0.
This plot shows how long borrowers have had a credit line; the credit histories span about 60 years.
The most common rating is C, followed by B. A and D come next (excluding NA).
Most loans have a 36-month term.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
The median and mean borrower rates are 18.4% and 19.28% respectively, and the maximum borrower rate is 0.4975, or 49.75%.
2013 is the year in which people borrowed the most, and 2009 is the year with the least borrowing. There could be many reasons behind that, such as economic crises or elections.
As the length of employment increases, the number of people who take out loans decreases.
As we can see, California is the state where people took out the most loans, followed by Florida, Illinois, New York and Texas.
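A ranking of states like this can be sketched by counting loans per state and keeping the largest counts. The state codes below are made up for illustration (they are not rows from the data), and I keep only the top three instead of the top ten to keep the example short.

```r
# Made-up state codes, for illustration only.
states <- c("CA", "TX", "CA", "NY", "FL", "CA", "TX", "NY", "TX", "CA")

# Count loans per state and sort from most to fewest.
state_counts <- sort(table(states), decreasing = TRUE)
top_states <- head(state_counts, 3)

# Re-level the factor so that a bar plot follows the ranked order
# instead of the default alphabetical order.
ranked <- factor(states[states %in% names(top_states)],
                 levels = names(top_states))
table(ranked)
```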
For this project I am using the Prosper data set, which contains all Prosper loans created until March 11th, 2014. The dataset contains both discrete and continuous variables. Each variable is a column and each row is an observation.
Other variables that help my investigation are Employment Duration, Debt To Income Ratio, Prosper Rating and Occupation.
I created the following new variables during the analysis: ListingCreationYear and DaysWithCreditLine.
I set the ListingCreationYear variable as a factor so that it would be treated as discrete when plotted. I also took care of the order of the ranked variable in the plot of the top 10 loan states.
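A sketch of how the two new variables might be derived. The two ListingCreationDate values are copied from the output earlier in this report, but the FirstRecordedCreditLine values are made up, and defining DaysWithCreditLine as the number of days between the first recorded credit line and the listing date is an assumption on my part; the reference date used in the actual analysis may differ.

```r
# ListingCreationDate values are taken from the output in this report;
# the FirstRecordedCreditLine values are made up for illustration.
df <- data.frame(
  ListingCreationDate     = c("2005-11-09 20:44:28.847000000",
                              "2013-10-02 17:20:16.550000000"),
  FirstRecordedCreditLine = c("1990-01-15 00:00:00",
                              "2001-10-11 00:00:00"),
  stringsAsFactors = FALSE
)

# Year of the listing, stored as a factor so plots treat it as discrete.
df$ListingCreationYear <- factor(substr(df$ListingCreationDate, 1, 4))

# Assumption: days from the first recorded credit line to the listing date.
listing_day    <- as.Date(substr(df$ListingCreationDate, 1, 10))
first_line_day <- as.Date(substr(df$FirstRecordedCreditLine, 1, 10))
df$DaysWithCreditLine <- as.numeric(listing_day - first_line_day)

str(df)
```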
Investors
As we can see, larger loans have more investors.
The mean loan amount went up slowly from 2005 to 2007, decreased in 2008 and fell to its minimum in 2009. After that it recovered, increased and peaked in 2013 and 2014.
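The yearly means behind a trend like this can be computed with tapply(). The rows below are made up for illustration; only the column names follow this report.

```r
# Made-up rows for illustration; only the column names follow the report.
df <- data.frame(
  ListingCreationYear = factor(c("2008", "2008", "2009", "2009", "2013", "2013")),
  LoanOriginalAmount  = c(6000, 7000, 4000, 5000, 10000, 11000)
)

# Mean loan amount for each listing year.
mean_by_year <- tapply(df$LoanOriginalAmount, df$ListingCreationYear, mean)
mean_by_year

# Years with the lowest and highest mean loan amount.
lowest_year  <- names(which.min(mean_by_year))
highest_year <- names(which.max(mean_by_year))
```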
## df$Term: 12
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 2000 3500 4694 5000 25000
## --------------------------------------------------------
## df$Term: 36
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 3000 5000 7276 10000 35000
## --------------------------------------------------------
## df$Term: 60
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2000 8000 11500 12370 15000 35000
Loan amounts get larger with longer terms. As we can see from the table, the median and mean both increase with the term.
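The per-term summaries have the shape of by() output, so a call along the following lines would produce them. This sketch uses only the first ten rows shown in the str() output, which contain no 12-month loans, so that group comes out empty here.

```r
# First ten rows of Term and LoanOriginalAmount, as printed by str().
df <- data.frame(
  Term = factor(c(36, 36, 36, 36, 36, 60, 36, 36, 36, 36),
                levels = c(12, 36, 60)),
  LoanOriginalAmount = c(9425, 10000, 3001, 10000, 15000,
                         15000, 3000, 10000, 10000, 10000)
)

# One summary() per term, printed in the same layout as in the report.
by(df$LoanOriginalAmount, df$Term, summary)

# Medians only; the empty 12-month group yields NA in this small sample.
median_by_term <- tapply(df$LoanOriginalAmount, df$Term, median)
median_by_term
```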
It is clear from the box plots that larger loans are, on average, associated with larger incomes.
As we can see, the amount of debt grows as the number of credit lines increases.
If we look at the mean values, we can see that the means for Baby & Adoption and Debt Consolidation are the highest, whereas those for Student and Vacation are the lowest.
As we can see, employed people requested more loans. Interestingly, people who are not employed borrowed more than part-time employed people.
BankcardUtilization, DelinquenciesLast7Years, PublicRecordsLast10Years and InquiriesLast6Months increase with the borrower rate, whereas RevolvingCreditBalance decreases. DaysWithCreditLine shows no significant change with BorrowerRate.
## [1] 0.255482
## [1] -0.05960823
## [1] 0.1702787
## [1] 0.1283138
## [1] -0.0474466
## [1] 0.18381
None of these variables has a strong relationship with the others. The strongest relationship is between BorrowerRate and BankcardUtilization, with a correlation of about 0.26.
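A correlation like these can be computed with cor(). Because several columns contain missing values, the sketch passes use = "complete.obs" so that incomplete pairs are dropped; whether the original analysis handled NAs this way is an assumption. The first five BorrowerRate values come from the str() output, while the sixth rate and all BankcardUtilization values are made up.

```r
# First five BorrowerRate values are from the str() output; the sixth rate
# and every BankcardUtilization value are made up for illustration.
df <- data.frame(
  BorrowerRate        = c(0.158, 0.092, 0.275, 0.0974, 0.2085, 0.31),
  BankcardUtilization = c(0.30, 0.21, NA, 0.04, 0.81, 0.95)
)

# Pearson correlation, using only rows where both values are present.
r <- cor(df$BorrowerRate, df$BankcardUtilization, use = "complete.obs")
r
```

The value printed here reflects the toy numbers only and should not be compared with the correlations reported above.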
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 44200 67000 82658 97925 7422574
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 36000 54000 65289 80000 21000035
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 45000 64000 73465 90000 1305000
The median and mean annual income are highest for the 12-month term, compared with the 36- and 60-month terms.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0100 0.1100 0.1700 0.2203 0.2800 10.0100 199
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.210 0.283 0.310 10.010 6953
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0100 0.1700 0.2300 0.2565 0.3200 10.0100 1402
As we can see, the median DebtToIncomeRatio increases as the term gets longer.
As we can see, better ratings go with lower borrower rates.
The median for the 60-month term is the highest, while the 12-month term has the lowest.
The number of investors increases with a higher Prosper score, a bigger loan amount, fewer existing Prosper loans and a lower estimated loss. The mean loan amount varies through the years: the minimum mean is in 2009 and the maximum is in 2013 and 2014. We can also see, for instance, that the borrower rate increases as the debt-to-income ratio increases. Moreover, bigger loan amounts come with longer terms.
I noticed that employed people are more likely to borrow than others. Also, the debt-to-income ratio is lowest for the AA rating.
The strongest relationship that I found was between BorrowerRate and BankcardUtilization, with a correlation of 0.255482.
As we can see, borrowers with higher incomes who are homeowners have lower borrower rates.
The plot above shows the relationship between the amount borrowers were delinquent and the number of delinquencies they have had over the last 7 years, separated by employment status. It is clear that the Employed and Full-time groups have the highest values.
This plot shows the relationship between the lender yield on the loan and the number of investors, together with the duration of the loan and the Prosper rating.
The plot above describes the risk category of each loan. It displays the progression from a safe area (green) to a risky area (red).
Monthly income has a positive correlation with rating, and late payments have a negative one. The number of investors increases as the Prosper score gets better and as the loan amount gets bigger. We can also see that higher loan amounts have longer terms.
I’ve observed that employed people are more likely to borrow money.
No, I did not.